Model Extrapolation
EpiCoDe: Boosting Model Performance Beyond Training with Extrapolation and Contrastive Decoding
Tao, Mingxu, Hu, Jie, Yang, Mingchuan, Liu, Yunhuai, Zhao, Dongyan, Feng, Yansong
The remarkable performance of large language models (LLMs) relies heavily on the availability of abundant high-quality training data. However, the high cost of acquiring annotated data often prevents models from acquiring the capabilities needed to tackle downstream tasks. In this paper, we introduce EpiCoDe, a novel method that boosts model performance in data-scarce scenarios without extra training. We first employ model extrapolation to enhance a finetuned model using its inferior version, and then adopt contrastive decoding to further reduce prediction errors by comparing the logit scores given by the extrapolated and the vanilla finetuned models. Experiments on three tasks across four different LLMs show that EpiCoDe consistently outperforms existing methods with significant and robust improvements. We also propose a new theoretical framework to reveal the mechanism behind contrastive decoding in data-scarce scenarios, which further helps us understand the effectiveness of EpiCoDe.
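To make the two-stage recipe concrete, here is a minimal PyTorch sketch, assuming the common weight-space extrapolation rule theta_ext = theta_ft + alpha * (theta_ft - theta_weak) and a standard contrastive-decoding plausibility mask; `alpha`, `beta`, `tau`, and the exact score combination are hypothetical choices, not values taken from the paper.

```python
import torch

def extrapolate(theta_ft, theta_weak, alpha=0.3):
    """Model extrapolation: push the finetuned weights further along the
    direction leading away from the inferior (weaker) checkpoint.
    theta_ext = theta_ft + alpha * (theta_ft - theta_weak)."""
    return {k: theta_ft[k] + alpha * (theta_ft[k] - theta_weak[k])
            for k in theta_ft}

@torch.no_grad()
def contrastive_logits(logits_ext, logits_ft, beta=1.0, tau=0.1):
    """Contrastive decoding step: amplify what the extrapolated ('expert')
    model prefers over the vanilla finetuned ('amateur') model, restricted
    to tokens the expert itself finds plausible."""
    probs_ext = logits_ext.softmax(dim=-1)
    # plausibility mask: keep tokens with prob >= tau * max prob under expert
    plausible = probs_ext >= tau * probs_ext.max(dim=-1, keepdim=True).values
    # contrast the two models' logit scores, as the abstract describes
    scores = logits_ext + beta * (logits_ext - logits_ft)
    return scores.masked_fill(~plausible, float("-inf"))
```

Decoding then proceeds as usual (greedy or sampled) from the masked contrastive scores at each step.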
Extrapolation Merging: Keep Improving With Extrapolation and Merging
Lin, Yiguan, Xu, Bin, Li, Yinghao, Gao, Yang
Large language models (LLMs) require instruction fine-tuning to perform different downstream tasks. However, the instruction fine-tuning phase still demands significant computational resources and labeled data, and there is no established paradigm for improving model performance without additional compute and data. Model merging aims to enhance performance by combining the parameters of different models, but without a clear optimization direction, merging does not always guarantee improved performance. In this paper, we attempt to provide such a direction. We first validate the effectiveness of model extrapolation during the instruction fine-tuning phase. We then propose Extrapolation Merging, a paradigm that continues to improve model performance without requiring extra computational resources or data. The extrapolation method supplies a clear direction for model merging, enabling a local optimization search and consequently enhancing the merged model's performance. We conduct experiments on seven different tasks, and the results show that our method consistently improves the model's performance after fine-tuning.
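A minimal sketch of the paradigm as the abstract describes it, assuming extrapolation along the fine-tuning direction theta_ft - theta_base followed by a small grid search over merge coefficients; the candidate grids and the `evaluate` callback are hypothetical, and the paper's actual search procedure may differ.

```python
def extrapolation_merging(theta_base, theta_ft, evaluate,
                          alphas=(0.1, 0.3, 0.5),
                          lambdas=(0.25, 0.5, 0.75)):
    """Use the fine-tuning direction to generate extrapolated candidates,
    merge each back with the finetuned weights, and keep the candidate that
    scores best on a held-out evaluation (local optimization search).
    `evaluate` maps a weight dict to a scalar score."""
    direction = {k: theta_ft[k] - theta_base[k] for k in theta_ft}
    best_score, best_theta = float("-inf"), theta_ft
    for a in alphas:
        # extrapolate past the finetuned model along the fine-tuning direction
        theta_ext = {k: theta_ft[k] + a * direction[k] for k in theta_ft}
        for lam in lambdas:
            # merge: linear interpolation between finetuned and extrapolated
            merged = {k: (1 - lam) * theta_ft[k] + lam * theta_ext[k]
                      for k in theta_ft}
            score = evaluate(merged)
            if score > best_score:
                best_score, best_theta = score, merged
    return best_theta
```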
On Model Extrapolation in Marginal Shapley Values
As the use of complex machine learning models continues to grow, so does the need for reliable explainability methods. One of the most popular approaches to model explainability is based on Shapley values. The two most commonly used ways of calculating Shapley values, conditional and marginal, produce different results when features are correlated. In our previous work, we demonstrated that the conditional approach is fundamentally flawed due to implicit assumptions of causality. However, it is well known that the marginal approach leads to model extrapolation into regions where the model might not be well defined. In this paper, we explore the impact of model extrapolation on Shapley values in the case of a simple linear spline model. Furthermore, we propose an approach which, while using marginal averaging, avoids model extrapolation and, with the addition of causal information, replicates causal Shapley values. Finally, we demonstrate our method on a real-data example.
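The extrapolation problem the abstract refers to is easy to exhibit in code: the marginal value function fixes the coalition features at the explained point and fills the remaining features with draws from the background marginals, so for correlated features the resulting hybrid rows can land far off the data manifold. A minimal NumPy sketch for a two-feature model, illustrating the generic marginal approach rather than the paper's proposed correction:

```python
import numpy as np

def marginal_value(f, x, background, S):
    """Marginal value function v(S): replace features outside the coalition
    S with values from the background data, ignoring feature correlations.
    The hybrid rows this creates may never occur in real data, which is
    exactly where the model is forced to extrapolate."""
    X = background.copy()
    X[:, S] = x[S]  # fix coalition features at the explained point
    return f(X).mean()

def shapley_two_features(f, x, background):
    """Exact marginal Shapley values for a model with two features."""
    v_empty = marginal_value(f, x, background, [])
    v_0 = marginal_value(f, x, background, [0])
    v_1 = marginal_value(f, x, background, [1])
    v_01 = marginal_value(f, x, background, [0, 1])
    phi_0 = 0.5 * ((v_0 - v_empty) + (v_01 - v_1))
    phi_1 = 0.5 * ((v_1 - v_empty) + (v_01 - v_0))
    return phi_0, phi_1
```

If the training data satisfies, say, x0 ≈ x1, the rows evaluated inside `marginal_value` pair x0 from the explained point with unrelated x1 values from the background, probing the model where it was never fit.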
Marginal Effects for Non-Linear Prediction Functions
Scholbeck, Christian A., Casalicchio, Giuseppe, Molnar, Christoph, Bischl, Bernd, Heumann, Christian
Beta coefficients of linear regression models represent the ideal form of an interpretable feature effect. However, for non-linear models, and especially generalized linear models, the estimated coefficients cannot be interpreted as a direct feature effect on the predicted outcome. Hence, marginal effects are typically used as approximations of feature effects, either as derivatives of the prediction function or as forward differences in prediction due to a change in a feature value. While marginal effects are commonly used in many scientific fields, they have not yet been adopted as a model-agnostic interpretation method for machine learning models. This may stem from their inflexibility as a univariate feature effect and their inability to handle the non-linearities found in black-box models. We introduce a new class of marginal effects termed forward marginal effects, arguing for the abandonment of derivatives in favor of more interpretable forward differences. Furthermore, we generalize marginal effects based on forward differences to multivariate changes in feature values. To account for the non-linearity of prediction functions, we introduce a non-linearity measure for marginal effects. We argue against summarizing the feature effects of a non-linear prediction function in a single metric such as the average marginal effect. Instead, we propose partitioning the feature space to compute conditional average marginal effects on feature subspaces, which serve as conditional feature effect estimates.
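Since the abstract defines forward marginal effects as forward differences in prediction due to a (possibly multivariate) change in feature values, a short sketch may help; the function names and the boolean-mask interface for feature subspaces are illustrative choices, not the authors' API.

```python
import numpy as np

def forward_marginal_effect(f, X, step):
    """Forward marginal effect: the change in prediction caused by adding a
    finite step (a vector, possibly changing several features at once) to
    each observation: fME(x, h) = f(x + h) - f(x)."""
    return f(X + step) - f(X)

def conditional_ame(f, X, step, mask):
    """Conditional average marginal effect: average the forward marginal
    effects over a subspace of observations selected by the boolean `mask`,
    rather than summarizing the whole feature space in a single number."""
    return forward_marginal_effect(f, X[mask], step).mean()
```

The non-linearity measure the abstract mentions would then compare each forward difference against a linear (secant) approximation over the same step.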